Low-swing Bus Driver and Receiver

BACKGROUND 

The performance of conventional microprocessors may be limited by RC 
characteristics of on-chip interconnects. These characteristics may delay signals that are 
transmitted over the interconnects. For example, the effective coupling capacitance of a
signal line is equal to Cc multiplied by a Coupling Capacitance Multiplier (CCM). The
CCM for a particular signal line is dependent upon the relative directions of signal 
transitions within the particular signal line and within a neighboring line. If the particular 
signal line carries a signal transition from a first signal level to a second signal level, CCM 
for the signal line is 1 if the neighboring line does not carry a signal transition, 0 if the
neighboring line carries a signal transition from the first signal level to the second signal 
level, and 2 if the neighboring line carries a signal transition from the second signal level to 
the first signal level. 
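
By way of illustration only, the CCM rule above may be sketched in Python as follows. The
function names and the encoding of transitions as +1, -1 and 0 are assumptions made for this
sketch and do not appear in the description.

```python
def ccm(victim: int, neighbor: int) -> int:
    """Coupling Capacitance Multiplier seen by a switching signal line.

    Assumed encoding (not from the description):
      +1  transition from the first signal level to the second signal level
      -1  transition from the second signal level to the first signal level
       0  no transition
    """
    if victim not in (-1, 1):
        raise ValueError("the rule is stated for a signal line that switches")
    if neighbor not in (-1, 0, 1):
        raise ValueError("neighbor must be -1, 0 or +1")
    if neighbor == 0:
        return 1  # neighboring line is quiet
    if neighbor == victim:
        return 0  # both lines switch in the same direction
    return 2      # lines switch in opposite directions


def effective_coupling_capacitance(c_c: float, victim: int, neighbor: int) -> float:
    """Effective coupling capacitance: Cc multiplied by the CCM."""
    return c_c * ccm(victim, neighbor)
```

For example, with a hypothetical Cc of 2.0 units, a line whose neighbor switches in the
opposite direction sees `effective_coupling_capacitance(2.0, +1, -1)`, which is twice Cc.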

FIG. 1 illustrates a conventional static bus architecture for the purpose of explaining 
capacitive effects that result from adjacent signal transitions on neighboring signal lines.
Bus 1 includes signal paths 10, 20 and 30. Signal path 10 comprises driver flip-flop 11, 
receiver flip-flop 12 and repeaters 13 through 16 connected serially therebetween. 
Repeaters 13 through 16 are intended to reduce signal delays caused by path 10 by creating 
a linear relationship between the length of signal path 10 and the signal delay associated 
therewith. Moreover, repeaters 13 through 16 are inverters that convert a received signal of
a first signal level to an output signal of a second signal level. Signal paths 20 and 30 are 
constructed similarly to signal path 10. 

FIG. 2 is a timing diagram illustrating signals on signal paths 10, 20 and 30 of bus 1. 
The diagram assumes that the bit values "1", "0" and "1" are to be transmitted over signal 
paths 10, 20 and 30, respectively. As shown, each of these values initially undergoes a
transition between time t1 and t2 due to a respective one of repeaters 13, 23 and 33. In
particular, repeater 23 converts the signal on path 20 from a low signal level to a high signal
level and repeaters 13 and 33 convert the signals on paths 10 and 30 from a high signal level
to a low signal level. Accordingly, CCM of signal path 20 relative to signal path 10 is 2,
and relative to signal path 30 is also 2. In addition, transitions occurring between times t3
and t4, t5 and t6, and t7 and t8 each result in a CCM of 2 for signal path 20 relative to signal
path 10, and a CCM of 2 for signal path 20 relative to signal path 30. The resulting impact
on worst-case delay, energy and peak supply current often renders the architecture of bus 1 
unsuitable. 

The delay of a bus can be improved by avoiding the worst-case situation of a CCM 
that is equal to 2. One approach uses a dynamic bus, in which bus segments pre-charge 
during one clock phase and conditionally evaluate in the next phase. Such a dynamic bus
provides a worst-case CCM of 1 because all bus segments pre-charge and evaluate in the same
direction. However, dynamic buses require additional clock routing and power for pre- 
charging even in the absence of input switching activity. The addition of aggressively-sized 
repeaters within such a bus may also contribute significantly to power consumption. 
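
The difference in worst-case CCM between a static bus and a dynamic bus may be checked by
enumerating the transitions each kind of bus permits. The sketch below is illustrative only:
the transition encoding (+1, -1, 0) and the choice of a falling evaluate direction are
assumptions, not details taken from the description.

```python
from itertools import product


def ccm(victim: int, neighbor: int) -> int:
    """CCM for a switching victim line; +1/-1 are opposite transitions, 0 is none."""
    if neighbor == 0:
        return 1
    return 0 if neighbor == victim else 2


def worst_case_ccm(allowed_transitions) -> int:
    """Largest CCM over every (victim, neighbor) pair in which the victim switches."""
    return max(
        ccm(victim, neighbor)
        for victim, neighbor in product(allowed_transitions, repeat=2)
        if victim != 0
    )


# A static bus line may rise, fall or hold in any cycle.
STATIC_BUS = (-1, 0, +1)

# During evaluation every dynamic bus segment either holds or moves in one
# common direction (assumed here to be falling, i.e. -1).
DYNAMIC_BUS_EVALUATE = (0, -1)

# During pre-charge every segment either holds or moves in the opposite
# common direction.
DYNAMIC_BUS_PRECHARGE = (0, +1)
```

Under these assumptions `worst_case_ccm(STATIC_BUS)` evaluates to 2, while both dynamic
phases evaluate to 1, because no pair of neighboring segments can switch in opposite
directions within the same phase.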

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a logical diagram of a conventional bus. 
FIG. 2 is a timing diagram of signals on a conventional bus. 
FIG. 3 is a schematic diagram of a driver according to some embodiments. 
FIG. 4 is a schematic diagram of a receiver according to some embodiments. 
FIG. 5 is a schematic diagram of a receiver according to some embodiments.
FIG. 6 is a schematic diagram of a driver according to some embodiments. 
FIG. 7 is a block diagram of a system according to some embodiments. 
DETAILED DESCRIPTION

In the following description, particular types of circuits and circuit elements are 
described for purposes of illustration. Other embodiments, however, may utilize other types 
of circuits. Further, although complementary metal-oxide semiconductor (CMOS) 
transistors are referred to in the illustrations that follow, it will be appreciated by those of
ordinary skill in the art that some embodiments may be implemented using various other 
types of processing technologies. 

FIG. 3 is a schematic diagram of system 100 according to some embodiments. 
System 100 comprises static low-swing driver circuit 110, interconnect 120 and dynamic 
receiver circuit 130. Static low-swing driver circuit 110 may receive a full-swing input
signal, convert the full-swing input signal to a low-swing signal, and transmit the low-swing 
signal. Additionally, dynamic receiver circuit 130 may receive the low-swing signal and 
convert the low-swing signal to a full-swing signal. 

System 100 may be used in any suitable implementation, including but not limited to 
an on-chip communication bus. In this regard, system 100 may comprise one bit-line of a
multi-line communication bus. According to some embodiments, a communication bus 
comprises 256 parallel instances of system 100. 

Driver circuit 110 comprises input line 111 coupled to inverter 112. Inverter 112
includes p-channel metal-oxide semiconductor (PMOS) transistor M1 and n-channel
metal-oxide semiconductor (NMOS) transistor M2. As shown, a gate of transistor M2 is coupled
to a gate of transistor M1 at input line 111, a source of transistor M1 is coupled to a supply
voltage (Vcc), a drain of transistor M1 is coupled to a drain of transistor M2, and a source of
transistor M2 is coupled to Vss.

Input line 111 is also coupled to an input of delay element 113, which comprises
PMOS transistor M3 and NMOS transistor M4. More specifically, delay element 113
comprises a pass gate, with a gate of transistor M3 coupled to Vss and a gate of transistor M4
coupled to Vcc. Moreover, the drains of transistors M3 and M4 are coupled to one another,
as are the sources of transistors M3 and M4. In some embodiments, a propagation delay
associated with delay element 113 is matched to a propagation delay associated with inverter
112.

Driver 110 further comprises NMOS transistor M5 and NMOS transistor M6. A
gate of transistor M5 is coupled to an output of inverter 112 and a drain of transistor M5 is
coupled to a voltage Vhi that is less than Vcc. A gate of transistor M6 is coupled to an
output of delay element 113 and a source of transistor M6 is coupled to Vss. A drain of
transistor M6 is coupled to a source of transistor M5 at output line 114 of driver 110.
Output line 114 is in turn coupled to interconnect 120.

In one example of operation, driver 110 converts a full-swing input signal to a
low-swing signal. More specifically, an input signal of "0" turns transistor M5 on and turns
transistor M6 off after a propagation delay associated with inverter 112 and delay element
113. Since transistor M5 is coupled to Vhi and not to Vcc, a low-voltage representation of a
"1" is transmitted to interconnect 120. If the input signal is "1", M6 is turned on and M5 is
turned off, resulting in the transmission of a low-voltage representation of a "0". The
transmitted low-swing signals may be inverted when received to directly represent their
corresponding input signals.
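
The behavior described above may be summarized by an idealized logic-level model. The
following Python sketch is illustrative only; the function names, the default Vss of 0.0 and
the mid-point receive threshold are assumptions, and the model ignores threshold-voltage
drops and propagation delay.

```python
def driver_110_output(input_bit: int, v_hi: float, v_ss: float = 0.0) -> float:
    """Idealized steady-state voltage on output line 114 for a full-swing input bit."""
    if input_bit not in (0, 1):
        raise ValueError("input_bit must be 0 or 1")
    # Inverter 112 drives the gate of M5 with the inverted input.
    m5_on = input_bit == 0
    # Delay element 113 passes the input, non-inverted, to the gate of M6.
    m6_on = input_bit == 1
    # Exactly one of the two NMOS devices conducts in steady state.
    assert m5_on != m6_on
    return v_hi if m5_on else v_ss


def recover_input_bit(v_line: float, v_hi: float, v_ss: float = 0.0) -> int:
    """Invert the received low-swing level to recover the original input bit.

    The mid-point threshold is an assumption made for this sketch.
    """
    low_swing_bit = 1 if v_line > (v_hi + v_ss) / 2 else 0
    return 1 - low_swing_bit
```

With a hypothetical Vhi of 0.5, `driver_110_output(0, 0.5)` yields 0.5 (a low-voltage "1")
and `driver_110_output(1, 0.5)` yields 0.0 (a low-voltage "0"); applying
`recover_input_bit` to either result returns the original input bit.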

Transmission of a low-voltage signal along interconnect 120 may provide reduced
capacitive coupling between adjacent interconnects within a bus that is composed of several
instances of system 100. In some embodiments, interconnect 120 does not comprise a
repeater, which may provide power and die area savings in comparison to systems using bus
repeaters. According to some embodiments, interconnect 120 is not pre-charged and
evaluated according to dynamic bus protocols. Such embodiments may provide power
savings, particularly in the absence of switching activity, over some dynamic bus-based
systems.

FIG. 4 is a schematic diagram of receiver 200 according to some embodiments. 
Receiver 200 may be used to implement dynamic receiver circuit 130 of system 100. 
Receiver 200 comprises a true single phase clock-style positive edge-triggered
level-restoring flip-flop according to some embodiments. Receiver 200 may operate to receive a
low-swing signal from interconnect 120 and to convert the low-swing signal to a full-swing 
signal. 

Receiver 200 comprises NMOS transistor M7, PMOS transistor M8, and PMOS
transistor M9. Respective gates of transistors M7 and M9 are coupled to input line 201,
which is in turn coupled to interconnect 120. A source of transistor M9 is coupled to Vhi,
and a source of transistor M7 is coupled to Vss. A drain of transistor M9 is coupled to a
source of transistor M8, and a drain of transistor M8 is coupled to a drain of transistor M7.
Transistor M8 receives a clock signal at its gate.

Node N0 is located at the coupling of transistors M7 and M8. Also coupled to node
N0 is a drain of PMOS transistor M10. A source of transistor M10 is coupled to Vcc. Node
N0 is also coupled to a gate of NMOS transistor M11.

A source of transistor M11 is coupled to a drain of NMOS transistor M12, whose
source is coupled to Vss. A drain of transistor M11 is coupled to a drain of PMOS transistor
M13 at node N1, and a source of transistor M13 is coupled to Vcc. The gates of transistors
M12 and M13 are both coupled to the clock signal.

Node N1 is coupled to the gates of NMOS transistor M14 and PMOS transistor M15.
A source of transistor M15 is coupled to Vcc, and a source of transistor M14 is coupled to
Vss. A drain of transistor M15 is coupled to a drain of NMOS transistor M16, and a source
of transistor M16 is coupled to a drain of transistor M14. The clock signal is coupled to a
gate of transistor M16.

Node N2 is located at the coupling of transistors M15 and M16. Also coupled to
node N2 are the gates of NMOS transistor M17 and PMOS transistor M18. A source of
transistor M18 is coupled to Vcc, and a source of transistor M17 is coupled to Vss. The
drains of transistors M17 and M18 meet at the output of receiver 200.

During a pre-charge phase, node N1 is coupled to Vcc via transistor M13 and is
therefore pre-charged to "1". Transistor M10 is therefore off because its gate is coupled to
node N1. Node N0 is therefore charged based on the signal on input line 201. More
particularly, the signal on input line 201 is inverted during pre-charge by virtue of an
inverter formed by transistors M7 and M9, and the inverted signal is reflected at node N0.

The clock signal goes high during the evaluation phase of receiver 200, so
transistor M8 turns off and the value at node N0 is held dynamically. In particular, node N1
discharges through transistors M11 and M12 if a "1" was stored at node N0 during the
pre-charge phase, which turns on transistor M10 and pulls node N0 to Vcc. If a "0" was stored
at node N0 during the pre-charge phase, then transistors M11, M13, M16 and M15 are off,
causing node N1 to remain at "1" and node N0 to remain at "0".

Transistor M16 is turned on during the evaluation phase, so the value of node
N1 is inverted to node N2 and then inverted again by the inverter composed of transistors
M17 and M18. The signal on input line 201 is therefore inverted four times and converted
to a full-swing signal before being output by receiver 200.
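
The four inversions described above may be traced with a logic-level sketch. The Python
model below is illustrative only: it treats each node as an ideal 0 or 1, collapses the
pre-charge and evaluation phases into one call, and uses a function name and dictionary
keys that are assumptions rather than terms from the description.

```python
def receiver_200_cycle(low_swing_bit: int) -> dict:
    """Node values after one pre-charge phase followed by one evaluation phase."""
    if low_swing_bit not in (0, 1):
        raise ValueError("low_swing_bit must be 0 or 1")

    # Pre-charge phase (clock low): N1 is pre-charged to 1 through M13, and
    # the M7/M9 inverter places the inverted input on N0 (first inversion).
    n0 = 1 - low_swing_bit
    n1 = 1

    # Evaluation phase (clock high): N1 discharges through M11 and M12 only
    # if N0 holds a 1, so N1 is the inverse of N0 (second inversion).
    if n0 == 1:
        n1 = 0

    # M16 is on during evaluation, so N2 is the inverse of N1 (third inversion).
    n2 = 1 - n1

    # The M17/M18 inverter produces the full-swing output (fourth inversion).
    out = 1 - n2
    return {"N0": n0, "N1": n1, "N2": n2, "out": out}
```

In this model the output equals the logic value received on input line 201, which is
consistent with an even number of inversions between the input and the output.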

By utilizing pre-charging of one internal node according to some embodiments, 
receiver 200 may provide a system that reduces bus power requirements in comparison to 
some dynamic bus systems.

FIG. 5 is a schematic diagram of receiver 300 according to some embodiments. 
Receiver 300 may be used to implement dynamic receiver circuit 130 of system 100. 
Receiver 300 comprises a positive edge-triggered dynamic sense-amplifying flip-flop 
according to some embodiments. Receiver 300 may operate to receive a low-swing signal 
from interconnect 120 and to convert the low-swing signal to a full-swing signal.

Input line 301 is coupled to interconnect 120 and to a gate of NMOS transistor M19
according to some embodiments. Input line 301 is also coupled to an input of inverter I1,
which is supplied by voltages Vhi and Vss because these voltages are the voltages based on
which driver 110 generates a low-swing signal that is received by receiver 300.

An output of inverter I1 is coupled to a gate of NMOS transistor M20. The sources of
transistors M19 and M20 are both coupled to Vss. The drains of transistors M19 and M20
are coupled to a drain and a source of NMOS transistor M21 at node N3 and node N4
respectively, with a gate of transistor M21 being coupled to Vcc. Sources of NMOS
transistors M22 and M23 are also respectively coupled to nodes N3 and N4, and the gates of
transistors M22 and M23 are coupled to a clock signal.

A drain of transistor M23 is coupled to a drain of transistor M24 and to a drain of 
transistor M25 at node N5. A gate of transistor M23 is coupled to a gate of transistor M24 
and a gate of transistor M25 is coupled to the clock signal. Sources of transistors M24 and
M25 are coupled to Vcc.

A drain of transistor M26 is coupled to a drain of transistor M27 and to a drain of 
transistor M28 at node N6. A gate of transistor M26 is coupled to a gate of transistor M27 
and a gate of transistor M28 is coupled to the clock signal. Sources of transistors M27 and 
M28 are coupled to Vcc. The gate of transistor M23 and the gate of transistor M24 are also
coupled to node N6, and the gate of transistor M26 and the gate of transistor M27 are also
coupled to node N5. Node N5 and node N6 reflect the values of output stage control signals
S, Sb and R, Rb, respectively.

Node N5 is coupled to an input of inverter I2 and to a gate of PMOS transistor M29.
Node N6 is coupled to an input of inverter I3 and to a gate of PMOS transistor M30. The
sources of transistors M29 and M30 are coupled to Vcc. The drain of transistor M29 is
coupled to output line Q, to an input of inverter I4, to an output of inverter I5, and to a drain
of NMOS transistor M31. The drain of transistor M30 is coupled to output line Qb, to an
output of inverter I4, to an input of inverter I5, and to a drain of NMOS transistor M32.

The sources of transistors M31 and M32 are coupled to Vss. A gate of transistor
M31 is coupled to an output of inverter I3, and a gate of transistor M32 is coupled to an
output of inverter I2.

In operation, nodes N5 and N6 are pre-charged to "1" through transistors M25 and
M28 when the clock signal is low. These values turn off transistors M29 and M30 and also
turn off transistors M31 and M32 after passing through inverters I2 and I3, respectively.
Accordingly, cross-coupled inverters I4 and I5 hold a previous state on output lines Q and
Qb during the pre-charge phase. Also during the pre-charge phase, the low clock signal
turns off transistors M21 and M22 to isolate the value on input line 301 from the output
stage.

The clock signal goes high during the evaluation phase and turns on transistors M21
and M22, causing either node N5 or node N6 to discharge depending on the value of the
input signal on input line 301. More specifically, if the input signal is a low-swing "0"
during the evaluation phase, transistor M19 will be turned off and transistor M20 will be
turned on. Node N6 will therefore discharge to "0" through transistors M22 and M20, and
node N5 will remain at "1". Control signals R and Rb will therefore be set to "1" and "0",
thereby discharging Q through transistor M31 and charging Qb through transistor M30.
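
The output stage behavior described above may be sketched at the logic level as follows.
This Python model is illustrative only. The case of a low-swing "1" is not spelled out in
the description and is inferred here by symmetry from the stated connections of transistors
M29 through M32 and inverters I2 and I3; the function name and argument names are likewise
assumptions.

```python
def receiver_300(clock: int, low_swing_bit: int, q_prev: int) -> tuple:
    """Return (N5, N6, Q, Qb) for an idealized receiver 300."""
    if clock == 0:
        # Pre-charge phase: N5 and N6 are pulled to 1 through M25 and M28,
        # M29 through M32 are all off, and I4/I5 hold the previous state.
        return 1, 1, q_prev, 1 - q_prev

    # Evaluation phase: exactly one of N5 and N6 discharges.
    if low_swing_bit == 0:
        n5, n6 = 1, 0  # stated in the description
    else:
        n5, n6 = 0, 1  # assumed by symmetry

    m29_on = n5 == 0        # PMOS, gate at N5, charges Q
    m30_on = n6 == 0        # PMOS, gate at N6, charges Qb
    m31_on = (1 - n6) == 1  # NMOS, gate at output of I3, discharges Q
    m32_on = (1 - n5) == 1  # NMOS, gate at output of I2, discharges Qb

    # Each output has exactly one active device, so there is no contention.
    assert m29_on != m31_on and m30_on != m32_on
    q = 1 if m29_on else 0
    qb = 1 if m30_on else 0
    return n5, n6, q, qb
```

For a low-swing "0" during evaluation, `receiver_300(1, 0, q_prev=1)` returns
`(1, 0, 0, 1)`: node N6 is discharged, Q is discharged through M31 and Qb is charged
through M30, matching the sequence described above.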

According to some embodiments, one of node N3 and node N4 is left in a
high-impedance state with a "0" present on the node if the input signal changes during the
evaluation phase. Leakage currents may then charge the one node and cause the latch to flip
state before a next rising edge of the clock signal.

Leakage in inverter I1 may be reduced by virtue of its connection to the same
reduced-window supply voltages that are used by a driver from which receiver 300 receives
a signal. In this regard, FIG. 6 is a schematic diagram of driver 400 that may be used in
conjunction with receiver 300 according to some embodiments. Driver 400 comprises
inverter 401 supplied by reduced-window supply voltages Vhi and Vlo. Voltage Vlo is
greater than Vss used by driver 110. As a result, driver 400 may generate output signals
having a smaller voltage range than those transmitted by driver 110. FIG. 6 shows driver
400 coupled to receiver 300 via interconnect 120. In some embodiments, receiver 300 of
FIG. 6 is identical to receiver 300 of FIG. 5 except that inverter I1 is coupled to voltage Vlo
instead of to voltage Vss.
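
The relationship between the output ranges of driver 110 and driver 400 may be illustrated
with a short calculation. The voltage values in the Python sketch below are hypothetical and
are chosen only so that Vss < Vlo < Vhi < Vcc; the description does not specify any
particular values.

```python
def swing(v_high: float, v_low: float) -> float:
    """Voltage range between the high and low levels of a signal."""
    if v_high <= v_low:
        raise ValueError("v_high must exceed v_low")
    return v_high - v_low


# Hypothetical supply values (not from the description).
V_CC, V_HI, V_LO, V_SS = 1.0, 0.75, 0.25, 0.0

full_swing = swing(V_CC, V_SS)        # conventional full-swing signal
driver_110_swing = swing(V_HI, V_SS)  # driver 110 switches between Vhi and Vss
driver_400_swing = swing(V_HI, V_LO)  # driver 400 switches between Vhi and Vlo
```

With these values the ranges are 1.0, 0.75 and 0.5 respectively, so the output of driver
400 spans a smaller range than that of driver 110, which in turn spans a smaller range
than a full-swing signal.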

FIG. 7 illustrates a block diagram of system 500 according to some embodiments. 
System 500 includes integrated circuit 502 which may be a microprocessor or another type
of integrated circuit. Integrated circuit 502 comprises sub-blocks such as arithmetic logic 
unit (ALU) 504 and on-die cache 506, which communicate with one another via bus 508. 
According to some embodiments, bus 508 comprises multiple instances of interconnects 120 
as described above, and ALU 504 and cache 506 each include instances of driver 110 and
receiver 130 for transmitting data to and receiving data from one another over bus 508.

Integrated circuit 502 may communicate with off-die cache 510. Integrated circuit 
502 may also communicate with system memory 512 via a host bus and chipset 514. 
Communication between integrated circuit 502 and off-die cache 510 and/or chipset 514
may proceed over a system such as system 100. System memory 512 may comprise any 
type of memory for storing data, such as a Single Data Rate Random Access Memory, a 
Double Data Rate Random Access Memory, or a Programmable Read Only Memory. Other 
off-die functional units, such as graphics controller 516 and Network Interface Controller 
(NIC) 518, may communicate with integrated circuit 502 via appropriate busses or ports.

The several embodiments described herein are solely for the purpose of illustration. 
Embodiments may include any currently or hereafter-known versions of the elements 
described herein. Therefore, persons skilled in the art will recognize from this description 
that other embodiments may be practiced with various modifications and alterations. 